Plant diseases cause significant productivity and economic losses, as well as
a reduction in the quality and quantity of agricultural products. A principal
cause of low crop yield is disease caused by bacteria, viruses, and fungi,
which can be mitigated by employing plant disease detection and classification
procedures. We used machine learning to detect and classify diseases in plant
leaves because it evaluates data from several perspectives and assigns each
sample to one of several predefined classes. In this research, we build a
sequential model for the classification task. We trained a convolutional neural
network (CNN) on the PlantVillage dataset, which contains about 55,000 images
divided into 39 distinct categories of healthy and affected leaves. We trained
the network with the Adam optimization technique because it usually converges
faster than alternative optimization techniques. We achieved a validation
accuracy of 98.74% using a CNN architecture with optimized parameters. As
observed, CNNs achieve high overall performance, making them well suited to the
automated identification of plant diseases from simple leaf images. The
experimental results are comparable to those of other current approaches in
the literature.
1. INTRODUCTION 

Crop diseases may appear to be a minor issue, but they have the potential to cause famine, and plant diseases
are one of the primary causes of worldwide food insecurity. In developing countries, where facilities and access
to plant-disease control methods are limited, the repercussions are more severe. Diagnosing crop disease
involves examining a large number of samples. Microscopy and DNA sequencing-based techniques provide detailed
data about the pathogens that cause disease, such as microorganisms, viruses, and fungi, among others [1].
However, the majority of farmers do not have access to these resources or to these diagnostic procedures.
According to the World Bank's research, in 2016 mobile communication was widespread in 7 out of 10 of the
world's poorest 20% of countries, while 45% of the world's population had internet access [2]. Recent
breakthroughs and extensive research in deep learning and artificial intelligence (AI) have helped overcome many
challenges in agriculture and vegetation, such as agricultural disease detection, yield identification, and smart
farming. As Rastogi [3] noted, if an automatic system can detect the type of disease a crop is suffering from,
pesticides or other relevant treatments can be supplied in time, which will aid agriculturally dependent
countries. Farmers could be given new technology, such as a smartphone app, that checks a crop's health early on
and alerts them to the need for treatment. The goal of our study is to develop a system that allows farmers to
assess a crop's health.

Diseases are changes in a plant's natural state that affect or disrupt critical functions, including
photosynthesis, transpiration, pollination, fertilization, and dissemination. Pathogens such as fungi, bacteria,
and viruses, as well as unfavorable environmental conditions, cause these diseases [4]. Detecting plant disease
at an early stage is therefore critical. However, this requires skilled, constant monitoring, which can be
excessively expensive and time-consuming for farmers. Finding a quick, low-cost, and effective method to detect
diseases automatically from symptoms on the plant leaf is therefore essential.

Here we describe the proposed solution for the detection and classification of plant diseases. The solution is
based on the patterns of affected and healthy plant leaves. We used raw data records from the PlantVillage
dataset, which are not in the proper format: the dataset contains some repeated images and some unlabeled
images. These are not needed for our research, and keeping them would increase the time required to run our
model, so we first removed the unlabeled and repeated images. The data in our dataset is in picture format
(JPG), which the models cannot interpret directly, so classification is not possible on the raw files. The
solution was therefore to design an algorithm that converts the image data into integers arranged in a fixed
sequence or pattern, analogous to how text is encoded, so that the model can learn the patterns in this integer
representation. We also converted the integer data into vector format to reduce the time our model needs to
detect patterns in the image data: in this conversion, each vector position is assigned to one text label
(string), and we use these vector values as the positions of our pictures. The model therefore works with the
vector positions of the pictures rather than with the raw files. Section 2 explains the work process for this
research.
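The cleaning and encoding steps described above can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not the exact algorithm used in this work: duplicates are detected by hashing the raw image bytes with SHA-256 (so only byte-identical copies are caught), records without a label are dropped, and each class name is mapped to an integer index and a one-hot vector. The function names, the `(image_bytes, label)` record format, and the sample labels are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical record format: (raw JPG bytes, class label or None if unlabeled).
Record = Tuple[bytes, Optional[str]]


def clean_records(records: List[Record]) -> List[Tuple[bytes, str]]:
    """Drop unlabeled records and byte-identical duplicate images (assumed approach)."""
    seen = set()
    cleaned = []
    for image_bytes, label in records:
        if not label:  # unlabeled image: unusable for supervised training
            continue
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest in seen:  # exact duplicate of an image already kept
            continue
        seen.add(digest)
        cleaned.append((image_bytes, label))
    return cleaned


def encode_labels(labels: List[str]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Map each class name to an integer index and to a one-hot vector."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    one_hot = []
    for name in labels:
        vec = [0] * len(classes)
        vec[index[name]] = 1  # the vector position identifies the class
        one_hot.append(vec)
    return index, one_hot


# Usage with made-up records: one duplicate and one unlabeled image are removed.
raw = [(b"img-a", "Apple_Healthy"), (b"img-a", "Apple_Healthy"),
       (b"img-b", None), (b"img-c", "Apple_scab")]
kept = clean_records(raw)
index, vectors = encode_labels([label for _, label in kept])
print(index)    # {'Apple_Healthy': 0, 'Apple_scab': 1}
print(vectors)  # [[1, 0], [0, 1]]
```

Near-duplicates (for example, the same leaf re-encoded at a different JPG quality) would not be caught by exact hashing; detecting them would need a perceptual comparison, which this sketch does not attempt.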

Al-Hiary [5] showed that, on a database of 330 images of paddy leaves, the proposed method achieved a test
accuracy of 76.59%. The study used accuracy as the only metric for evaluating the K-nearest neighbor (KNN)
classifier, rather than metrics such as precision, recall, or F1-score, which are discussed in this paper.
Suresha [6] employed a deep convolutional neural network (CNN) model to examine the recognition of plant
diseases, and the proposed work achieved an accuracy of 96.50%. To classify the various plant diseases, the
study employs the well-known AlexNet architecture [7], a neural network with eight learnable layers that is
widely used for image classification. The study used photos from the PlantVillage collection, which contains
54,323 photographs of plant diseases in 38 different disease categories [8].

Hatuwal [9] investigated the detection of plant diseases using several machine learning models, such as
support vector machine (SVM), K-nearest neighbor (KNN), random forest classifier (RFC), and convolutional
neural network (CNN). The CNN model had the highest accuracy of all the models, at 97.89%, followed by the
RFC with 87.436%, SVM with 78.61%, and finally KNN with 76.969% [10]. Unlike earlier research, this study
used precision, recall, and F1-score to evaluate its models; however, only accuracy was used to compare the
models and decide on the best-performing one.
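Because several of the studies above rely on accuracy alone, the following minimal Python sketch shows how per-class precision, recall, and F1-score can be computed from true and predicted labels with only the standard library. The function names, the macro-averaging choice, and the toy labels are our own illustrative assumptions.

```python
from typing import Dict, List, Tuple


def per_class_prf(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return {class: (precision, recall, f1)} for every class in y_true or y_pred."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


def macro_f1(scores: Dict[str, Tuple[float, float, float]]) -> float:
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy labels (illustrative only): one "scab" leaf is misclassified as "healthy".
y_true = ["scab", "scab", "healthy", "rust"]
y_pred = ["scab", "healthy", "healthy", "rust"]
scores = per_class_prf(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, scores["scab"], round(macro_f1(scores), 4))
```

In this toy case the accuracy is 0.75, yet the recall for the "scab" class is only 0.5, which is the kind of per-class weakness that accuracy alone hides.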

In a study by Agarwal [11] on the recognition of diseases in tomato leaves using convolutional neural
networks, the proposed CNN model scored 91.2% accuracy, compared to pre-trained CNN models such as VGG16
(77.2%), MobileNet (63.75%), and the Inception model (63.34%). The CNN model proposed in that study consists
of three convolution layers and three max pooling layers. The study further highlights the advantages of not
using a pre-trained model, finding that the proposed model required just 1.5 MB of storage space, compared to
100 MB for pre-trained models [12]. An artificial neural network (ANN) model has also been proposed for
classifying Phyllanthus elegans Wall. leaf disease into two categories: healthy and unhealthy. The color scheme
of the images was changed, image processing techniques were used to prepare the herb plant images, and the
images were then categorized based on the leaf's color and size. As shown by Padol [13], a linear SVM was used
to classify diseases on plant leaves. The input photographs of the berries and the disease areas are subjected
to preprocessing procedures. Clustering methods were used to recognize the disease areas, from which color and
texture information is extracted.

Mohanty [14] used a convolutional neural network (CNN) to detect leaf diseases, using photos from a large
dataset that included both healthy and diseased plant leaves. Rothe [15] tried out three different types of
data: color, grayscale, and segmented. This CNN model can quickly identify twenty-six diseases in fourteen
different crops. A convolutional neural network (CNN) was also deployed for leaf disease identification in [16].
Madhuvaban [17] and Tm [18] likewise used photos from a large dataset containing both healthy and affected
leaves of various plants, and built a CNN-based model to easily determine the disease classes.
2. METHOD 

Image processing and classification algorithms work in quite similar ways. Our proposed methodology mainly
consists of five stages. The basic methodology of image detection is shown in Figure 1.

a) Image acquisition: We collected the images from the PlantVillage dataset, which is free to use. In this
part of the process, the dataset is loaded and prepared for further work.

b) Image processing: In this part, the images are processed for further work using methods such as noise
removal, resizing, color transformation, and enhancement [19].

c) Feature extraction: In this part, the images from the dataset are converted into arrays, as shown by
Upadhyay [20]. Because the computer cannot work with the images directly, we have to convert them into a
machine-readable format, as mentioned by Patil [8].

d) Classification: For image classification, we used a convolutional neural network and deep learning, as
shown by Ghadge [21]. We train the proposed model on our dataset, and the trained model then classifies the
diseases.

e) Detection of diseases: To detect diseases, the user supplies pictures, and the model shows the output for
each picture.
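Steps (b) and (c) above can be illustrated with a small, dependency-free Python sketch. It assumes an image has already been decoded into rows of RGB integer pixel values (decoding JPG files would normally be done with an imaging library, which is not shown), resizes it to the 256 × 256 size listed in Table 2, and scales the values to [0, 1]. Nearest-neighbour interpolation and [0, 1] scaling are our assumptions; the paper does not state which resizing or normalization was used, and the function names are hypothetical.

```python
from typing import List

# Assumed input: an already-decoded RGB image as rows of [R, G, B] integers (0-255).
Pixel = List[int]
Image = List[List[Pixel]]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize (an assumed choice of interpolation)."""
    in_h, in_w = len(img), len(img[0])
    return [[list(img[r * in_h // out_h][c * in_w // out_w]) for c in range(out_w)]
            for r in range(out_h)]


def to_float_array(img: Image) -> List[List[List[float]]]:
    """Scale 0-255 integers to floats in [0, 1]; result is height x width x depth."""
    return [[[value / 255.0 for value in pixel] for pixel in row] for row in img]


def preprocess(img: Image, height: int = 256, width: int = 256) -> List[List[List[float]]]:
    """Resize to the WIDTH x HEIGHT of Table 2 and normalize; DEPTH stays 3 (RGB)."""
    return to_float_array(resize_nearest(img, height, width))


toy = [[[0, 0, 0], [255, 255, 255]],
       [[255, 0, 0], [0, 0, 255]]]
arr = preprocess(toy)
print(len(arr), len(arr[0]), len(arr[0][0]))  # 256 256 3
```

The resulting nested list has the height × width × depth layout that a 2D convolutional network expects as input.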

Our model starts the image detection and classification process with convolutional layers; specifically, it uses
2D convolutional (Conv2D) layers. Each Conv2D layer requires the number of filters to be specified. We use a
sequential model that applies Conv2D layers several times. The architecture of our proposed model is shown in
Figure 2.


[Figure 1 flowchart: Image Acquisition → Image Processing → Image Segmentation → Feature Extraction →
Classification → Detection of Diseases]

Figure 1. Basic methodology for image detection
Figure 2. Architecture of proposed model 


Our proposed model starts with a 3×3 convolutional layer with 32 filters, followed by a 2×2 max pooling layer.
The second convolutional layer is 3×3 with 64 filters, again followed by a 2×2 max pooling layer. The third
layer is also a 3×3 convolutional layer, with 128 filters, and it is likewise followed by a 2×2 max pooling
layer, as shown by Tian [22]. The application of 2D convolution is shown in Figure 3.
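To make the layer sizes above concrete, the following Python sketch traces the output shape and the number of trainable parameters of each layer for a 256 × 256 × 3 input (Table 2). The paper does not state the stride or padding, so the sketch assumes stride 1 with "valid" (no) padding for the convolutions and non-overlapping 2 × 2 pooling that rounds down; with "same" padding the spatial sizes would differ. The function name is hypothetical.

```python
from typing import List, Tuple


def conv_stack_shapes(h: int, w: int, c: int, filters: Tuple[int, ...] = (32, 64, 128),
                      k: int = 3, pool: int = 2) -> List[Tuple[str, Tuple[int, int, int], int]]:
    """Trace (layer, output_shape, params) through Conv(k x k) + MaxPool(pool) blocks.

    Assumes stride-1 'valid' convolutions and floor-rounded, non-overlapping pooling.
    """
    rows = []
    for f in filters:
        h, w = h - k + 1, w - k + 1        # 'valid' convolution shrinks each side by k - 1
        params = (k * k * c + 1) * f       # kernel weights plus one bias per filter
        c = f
        rows.append((f"Conv2D {k}x{k}, {f} filters", (h, w, c), params))
        h, w = h // pool, w // pool        # max pooling has no trainable parameters
        rows.append((f"MaxPool {pool}x{pool}", (h, w, c), 0))
    return rows


for name, shape, params in conv_stack_shapes(256, 256, 3):
    print(f"{name:<26} {shape}  params={params}")
```

Under these assumptions the final feature map is 30 × 30 × 128, and the three convolutional layers hold 896 + 18,496 + 73,856 = 93,248 trainable parameters, before any dense layers are added.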


[Figure 3 image: 2D convolutional operation applied to the input]

Figure 3. Application process of 2D convolution 
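The operation in Figure 3 can be written out directly. The sketch below is a plain-Python, single-channel illustration, not the optimized implementation used by deep learning libraries: a 3 × 3 kernel slides over the input with stride 1 and no padding, and each output value is the sum of element-wise products. As in most CNN libraries, the kernel is not flipped, so this is technically cross-correlation. A 2 × 2 max pooling function is included for the pooling layers; the input values and the all-ones kernel are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D convolution, stride 1, no padding ('valid').

    As in CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(x[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # illustrative 3x3 kernel
print(conv2d(image, box))  # [[54, 63], [90, 99]]
print(max_pool2d(image))   # [[6, 8], [14, 16]]
```

A real Conv2D layer repeats this for every filter and sums over all input channels, adding a bias; the 32, 64, and 128 filters in our model each produce one such output map.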


2.1. Sequential model 

Machine learning models that take or produce sequences of data are known as sequence models [23]; examples
of sequential data include text streams, audio snippets, video clips, and time-series data. In Keras, by
contrast, the Sequential model is simply the easiest way to build a model: it lets us construct the model layer
by layer as a linear stack, in which each layer receives the output of the layer before it [24]. The working
process of the sequential model is illustrated in Figure 4.
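The idea of a sequential model, a linear stack in which each layer consumes the previous layer's output, can be shown without any framework. The `TinySequential` class below is a hypothetical, minimal stand-in written for illustration; it is not the Keras API and has no trainable weights. The layers are plain Python functions applied in the order they were added.

```python
from typing import Callable, List

Vector = List[float]


class TinySequential:
    """Hypothetical minimal stand-in for a sequential (linear) stack of layers.

    Not the Keras API: layers here are plain functions with no trainable weights.
    """

    def __init__(self) -> None:
        self.layers: List[Callable[[Vector], Vector]] = []

    def add(self, layer: Callable[[Vector], Vector]) -> "TinySequential":
        self.layers.append(layer)
        return self  # allow chaining: model.add(a).add(b)

    def __call__(self, x: Vector) -> Vector:
        for layer in self.layers:  # each layer consumes the previous layer's output
            x = layer(x)
        return x


def double(v: Vector) -> Vector:
    return [2.0 * t for t in v]


def relu(v: Vector) -> Vector:
    return [max(0.0, t) for t in v]


model = TinySequential().add(double).add(relu)
print(model([-1.0, 0.5, 2.0]))  # [0.0, 1.0, 4.0]
```

In Keras itself, the corresponding class is `tf.keras.Sequential`, to which layers such as `tf.keras.layers.Conv2D` are appended with `model.add(...)`; the order of `add` calls fixes the order in which data flows through the network.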


[Figure 4 flowchart, beginning with "Requirements specification"]

Figure 4. Working process of sequential model 


2.2. Training the CNN model 

After building the model, we train it on the dataset using the proposed algorithms and Python libraries. The
names of the plant disease classes and the number of images in each are presented in Table 1. Some plant leaves
with different diseases are illustrated in Figure 5. Figure 5(a) shows a leaf infected by apple scab, Figure 5(b)
a healthy apple leaf, and Figure 5(c) a healthy blueberry leaf. Figures 5(d), 5(e), and 5(f) show leaves affected
by corn Cercospora leaf spot, grape black rot, and potato late blight, respectively. The model is trained with
the hyperparameters epochs, steps, LR (learning rate), batch size, and image width, height, and depth, whose
values are listed in Table 2.
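Since training relies on the Adam optimizer with the learning rate in Table 2 (1e-3), a one-parameter sketch of the Adam update rule may help. It implements the standard algorithm with the commonly used defaults β1 = 0.9, β2 = 0.999, and ε = 1e-8; these values are assumptions, since the paper does not report them. The toy objective f(θ) = (θ − 3)² and the function name are our own illustrative choices.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], theta: float, lr: float = 1e-3,
                  beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
                  steps: int = 20000) -> float:
    """Minimize a 1-D function with the standard Adam update, given its gradient."""
    m = 0.0  # exponential moving average of gradients (first moment)
    v = 0.0  # exponential moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy problem (our own example): f(theta) = (theta - 3)^2, gradient 2 * (theta - 3).
result = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(round(result, 2))
```

Because Adam divides each step by the running root-mean-square of the gradient, the first update moves θ by almost exactly the learning rate (here 0.001) regardless of the gradient's scale. In Keras the optimizer is available as `tf.keras.optimizers.Adam(learning_rate=1e-3)`.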
Table 1. Diseases of plant disease dataset 


Disease class                             Images
Scab on Apple Leaf                        630
Black Rot on Apple Leaf                   621
Cedar Apple Rust on Apple Leaf            275
Apple Leaf Healthy                        1645
Blueberry Leaf Healthy                    1502
Powdery Mildew on Cherry                  1052
Cherry Leaf Healthy                       854
Grey Leaf Spot on Corn                    513
Common Rust on Corn Leaf                  1192
Northern Leaf Blight on Corn Leaf         985
Corn Leaf Healthy                         1162
Black Rot on Grape Leaf                   1180
Black Measles on Grape Leaf               1383
Leaf Blight on Grape Leaf                 1076
Grape Leaf Healthy                        423
Huanglongbing on Orange Leaf              5507
Bacterial Spot on Peach Leaf              2297
Peach Leaf Healthy                        360
Bacterial Spot on Pepper Leaf             997
Pepper Leaf Healthy                       1478
Early Blight on Potato Leaf               1000
Potato Leaf Healthy                       152
Late Blight on Potato Leaf                1000
Raspberry Leaf Healthy                    371
Soybean Leaf Healthy                      5090
Powdery Mildew on Squash Leaf             1835
Strawberry Leaf Healthy                   456
Leaf Scorch on Strawberry Leaf            1109
Bacterial Spot on Tomato Leaf             2127
Early Blight on Tomato Leaf               1000
Tomato Leaf Healthy                       1591
Late Blight on Tomato Leaf                1909
Leaf Mold on Tomato Leaf                  952
Septoria Leaf Spot on Tomato Leaf         1771
Two Spotted Spider Mites on Tomato Leaf   1676
Target Spot on Tomato Leaf                1404
Mosaic Virus on Tomato Leaf               373
Yellow Leaf Curl Virus on Tomato Leaf     5357
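The class counts in Table 1 can be checked with a few lines of Python. The sketch below simply re-enters the table; it shows that the listed classes number 38 and total 54,305 images (compared with the approximately 55,000 images and 39 categories cited in the abstract), and that the dataset is strongly imbalanced: the largest class, Huanglongbing on Orange Leaf (5,507 images), is about 36 times the size of the smallest, Potato Leaf Healthy (152 images). Imbalance of this kind is one reason per-class precision, recall, and F1-score are informative alongside accuracy. The names `TABLE_1` and `summarize` are our own.

```python
# Image counts per class, re-entered from Table 1.
TABLE_1 = {
    "Scab on Apple Leaf": 630,
    "Black Rot on Apple Leaf": 621,
    "Cedar Apple Rust on Apple Leaf": 275,
    "Apple Leaf Healthy": 1645,
    "Blueberry Leaf Healthy": 1502,
    "Powdery Mildew on Cherry": 1052,
    "Cherry Leaf Healthy": 854,
    "Grey Leaf Spot on Corn": 513,
    "Common Rust on Corn Leaf": 1192,
    "Northern Leaf Blight on Corn Leaf": 985,
    "Corn Leaf Healthy": 1162,
    "Black Rot on Grape Leaf": 1180,
    "Black Measles on Grape Leaf": 1383,
    "Leaf Blight on Grape Leaf": 1076,
    "Grape Leaf Healthy": 423,
    "Huanglongbing on Orange Leaf": 5507,
    "Bacterial Spot on Peach Leaf": 2297,
    "Peach Leaf Healthy": 360,
    "Bacterial Spot on Pepper Leaf": 997,
    "Pepper Leaf Healthy": 1478,
    "Early Blight on Potato Leaf": 1000,
    "Potato Leaf Healthy": 152,
    "Late Blight on Potato Leaf": 1000,
    "Raspberry Leaf Healthy": 371,
    "Soybean Leaf Healthy": 5090,
    "Powdery Mildew on Squash Leaf": 1835,
    "Strawberry Leaf Healthy": 456,
    "Leaf Scorch on Strawberry Leaf": 1109,
    "Bacterial Spot on Tomato Leaf": 2127,
    "Early Blight on Tomato Leaf": 1000,
    "Tomato Leaf Healthy": 1591,
    "Late Blight on Tomato Leaf": 1909,
    "Leaf Mold on Tomato Leaf": 952,
    "Septoria Leaf Spot on Tomato Leaf": 1771,
    "Two Spotted Spider Mites on Tomato Leaf": 1676,
    "Target Spot on Tomato Leaf": 1404,
    "Mosaic Virus on Tomato Leaf": 373,
    "Yellow Leaf Curl Virus on Tomato Leaf": 5357,
}


def summarize(counts):
    """Return (number of classes, total images, largest class, smallest class, size ratio)."""
    total = sum(counts.values())
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    return len(counts), total, largest, smallest, counts[largest] / counts[smallest]


n_classes, total, largest, smallest, ratio = summarize(TABLE_1)
print(n_classes, total, largest, smallest, round(ratio, 1))
```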


Figure 5. Diseased and healthy plant leaves: (a) Apple_scab, (b) Apple_Healthy, (c) Blueberry_healthy, (d) Corn (maize)
Cercospora leafspot Gray_leafspot, (e) Grape_Black_Rot, and (f) Potato_Late_Blight
Table 2. Hyperparameters for testing 
Hyperparameter    Value
EPOCH             25
STEPS             100
LR                1e-3
BATCH_SIZE        32
WIDTH             256
HEIGHT            256
DEPTH             3
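Assuming that STEPS in Table 2 means the number of batches per epoch (its usual meaning in Keras-style training, where it corresponds to the `steps_per_epoch` argument of `model.fit`), the hyperparameters determine how many images the network processes during training. The arithmetic is sketched below; the dictionary and function names are hypothetical.

```python
# Values from Table 2; STEPS is assumed to mean batches per epoch.
HYPERPARAMS = {"EPOCH": 25, "STEPS": 100, "LR": 1e-3, "BATCH_SIZE": 32,
               "WIDTH": 256, "HEIGHT": 256, "DEPTH": 3}


def training_budget(hp):
    """Return (images per epoch, total image presentations, input values per image)."""
    images_per_epoch = hp["STEPS"] * hp["BATCH_SIZE"]
    images_seen = images_per_epoch * hp["EPOCH"]
    values_per_image = hp["WIDTH"] * hp["HEIGHT"] * hp["DEPTH"]
    return images_per_epoch, images_seen, values_per_image


print(training_budget(HYPERPARAMS))  # (3200, 80000, 196608)
```

Under this reading, each epoch processes 3,200 images, 25 epochs amount to 80,000 image presentations, and each input image is a tensor of 256 × 256 × 3 = 196,608 values.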


3. RESULT AND DISCUSSION 

The proposed model was created using more than 30% of the data (images). After creating the sequential model,
we used the Adam optimizer to train it: we passed 30% of the data records forward and backward through the
neural network 25 times (i.e., for 25 epochs). In the first attempt we obtained an accuracy of 94%, in the
second 97%, and in the remaining attempts 98.7%. From this result, we observed that accuracy increased with
further training before levelling off at 98.7%. Because the sequential model gave the highest accuracy, we used
it to test our project and predict plant diseases in leaves, and its predictions were correct in 98.7% of cases.
When any picture of a plant leaf that is relevant to our dataset is given as input, the model outputs the class
name of the disease. Among the many papers we reviewed, we compared our results with the most efficient ones,
[25] and [26]. Rauf [25] also used a CNN algorithm, but reported lower accuracy than ours. Table 3 compares the
accuracy of the models in [25] and [26] with that of our proposed model.
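The final prediction step, in which the model returns the class name of the disease for an input picture, usually amounts to picking the class with the highest softmax probability. The following Python sketch illustrates this under that assumption; the three-class label list and the raw scores (logits) are made up for illustration and are not outputs of our model.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits: List[float], class_names: List[str]) -> Tuple[str, float]:
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical class list and scores, for illustration only.
names = ["Apple_scab", "Apple_Healthy", "Potato_Late_Blight"]
label, confidence = predict_class([0.5, 2.0, -1.0], names)
print(label, round(confidence, 3))
```

Returning the probability together with the name lets an application flag low-confidence predictions instead of silently reporting a class.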

The training and validation accuracy and the training and validation loss are plotted with matplotlib and shown
in Figure 6. Figure 6(a) depicts training accuracy versus validation accuracy, and Figure 6(b) shows training
loss versus validation loss.
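Figure 6 tracks both accuracy and loss. The paper does not name its loss function; categorical cross-entropy is the usual choice for a softmax multi-class classifier, so the sketch below assumes it. It computes both quantities from one-hot labels and predicted probability vectors using only the standard library; the function name and toy numbers are illustrative only.

```python
import math
from typing import List, Tuple


def accuracy_and_loss(y_true: List[List[int]], y_prob: List[List[float]],
                      eps: float = 1e-12) -> Tuple[float, float]:
    """Accuracy and mean categorical cross-entropy for one-hot targets."""
    correct = 0
    total_loss = 0.0
    for target, probs in zip(y_true, y_prob):
        true_idx = target.index(1)
        pred_idx = max(range(len(probs)), key=probs.__getitem__)
        correct += int(pred_idx == true_idx)
        # Cross-entropy with a one-hot target reduces to -log(p of the true class);
        # clipping by eps avoids log(0).
        total_loss -= math.log(max(probs[true_idx], eps))
    n = len(y_true)
    return correct / n, total_loss / n


# Two toy predictions: the first is correct and confident, the second is wrong.
acc, loss = accuracy_and_loss([[1, 0], [0, 1]], [[0.9, 0.1], [0.6, 0.4]])
print(acc, round(loss, 4))
```

Loss can keep decreasing after accuracy levels off, because it also rewards more confident correct predictions; this is one reason to plot both curves.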


Table 3. Comparison of results
CNN structure                Accuracy    Loss
Model described in [25]      91.24%      0.4997
Model described in [26]      96.47%      0.2487
Our model                    98.74%      0.1266


[Figure 6 panels: (a) "Training and Validation accuracy", training and validation accuracy with x-axis 0 to 25;
(b) "Training and Validation loss", training and validation loss with x-axis 0 to 25]

Figure 6. Validation graph: (a) training accuracy vs. validation accuracy and (b) training loss vs. validation loss


4. CONCLUSION 

Manually detecting and classifying plant diseases in leaves is a difficult and complicated task that takes a
lot of time, effort, and manpower. In this research, we addressed the detection and classification of plant
diseases in leaves using a machine learning approach. Our results indicate that using ML algorithms to identify
and categorize diseases on plant leaves saves time and effort and can give more accurate results. We used the
PlantVillage dataset, which is public and free to use. We applied different algorithms, models, and classifiers
to obtain the best possible results from our dataset, and found that the sequential model works best in our
research procedure. The primary objective of this research is to detect plant diseases in leaves and to
correctly report when a leaf has no disease. Individuals or groups who are unsure about a plant disease can use
the model to predict it. Because leaf diseases are a major problem for farmers, our aim is to build a model that
can detect and classify them.

Our research has some limitations. Plant disease detection is a relatively new problem in machine learning, so
only a few datasets are available to work with; with more data, our models would likely perform better. Because
our dataset consists of images of plant leaves, we worked only with leaf images, but plant diseases can also
occur in other parts of the plant body. We could not address diseases of the plant body because very few, and
insufficient, datasets are available, so our model cannot detect them. In addition, because we have almost
fifty-five thousand images, the computational complexity is high, and a large amount of storage and main memory
is required for the computation. Further testing of our model also takes more time. Plant disease detection and
classification still poses numerous difficulties that researchers must address. For example, to limit the
damage caused by plant diseases, it is necessary to identify the key factors involved in their development. A
dataset with a large number of leaf disease records could enable more accurate detection and classification
than has been possible in the past or present. Graph theory and machine learning approaches can be used to
identify the primary sources involved in the spread of plant diseases. Likewise, real-time plant disease
detection and classification is another possible area for future work.